Does the Dispersion Parameter of Negative Binomial Models Truly Estimate the Level of Dispersion in Over-dispersed Crash Data with a Long Tail?

Author

  • Yajie Zou
Abstract

Despite the many statistical models that have been proposed for modeling motor vehicle crashes, the most commonly used statistical tool remains the negative binomial (NB) model. Crash data collected for safety studies may exhibit over-dispersion and a long tail (i.e., a few sites have an unusually high number of crashes). However, some studies have shown that NB models cannot adequately handle over-dispersed count data with a long tail. So far, no work has investigated the performance of the dispersion parameter of the NB model when analyzing over-dispersed crash data with a long tail. The dispersion parameter of the NB model plays an important role in various types of transportation safety analyses. The first objective of this study is to examine whether the dispersion parameter can truly reflect the level of dispersion in over-dispersed crash data with a long tail. The second objective is to determine whether the dispersion term of the Sichel (SI) model can be used as an alternative to the dispersion parameter of the NB model. To accomplish these objectives, 3,000 data sets are simulated from NB and SI regression models using different values for the mean and the dispersion level. For the simulated data sets, the dispersion parameter and the dispersion term are estimated and compared with the true values. To complement the simulation study, crash data collected in Texas are also used to compare the dispersion parameter and the dispersion term. The results of this study suggest that the dispersion parameter of the NB model can erroneously estimate the level of dispersion in over-dispersed count data with a long tail, whereas the dispersion term of the SI model is more reliable in estimating the true level of dispersion. Thus, considering the findings of this study, it is believed that the dispersion term may offer a viable alternative for analyzing over-dispersed crash data with a long tail.
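The NB half of the simulate-then-estimate procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code. It assumes the NB2 form Var(y) = μ + αμ², a single uniform covariate with arbitrarily chosen coefficients, and a two-step estimator chosen here for simplicity: Poisson IRLS for the mean model, then a golden-section search that maximizes the NB log-likelihood in α. The helper names `simulate_nb`, `poisson_irls`, `nb2_loglik`, and `estimate_alpha` are hypothetical, and the Sichel side of the comparison is not shown.

```python
import math

import numpy as np


def simulate_nb(n, beta0, beta1, alpha, rng):
    """Draw NB2 counts as a Poisson-gamma mixture with Var(y) = mu + alpha * mu**2."""
    x = rng.uniform(0.0, 1.0, size=n)
    mu = np.exp(beta0 + beta1 * x)
    # Gamma multiplier with mean 1 and variance alpha (shape 1/alpha, scale alpha).
    frailty = rng.gamma(shape=1.0 / alpha, scale=alpha, size=n)
    y = rng.poisson(mu * frailty)
    return x, y


def poisson_irls(x, y, iters=25):
    """Fit log(mu) = b0 + b1 * x by Poisson IRLS; the mean model stays consistent under NB2."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([math.log(y.mean()), 0.0])
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response
        XtW = X.T * mu            # X^T W with W = diag(mu)
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta, np.exp(X @ beta)


def nb2_loglik(alpha, y, mu):
    """NB2 log-likelihood in alpha for fixed means (the constant lgamma(y + 1) is dropped)."""
    r = 1.0 / alpha
    total = 0.0
    for yi, mi in zip(y.tolist(), mu.tolist()):
        total += (math.lgamma(yi + r) - math.lgamma(r)
                  + r * math.log(r / (r + mi)) + yi * math.log(mi / (r + mi)))
    return total


def estimate_alpha(y, mu, lo=-8.0, hi=5.0, tol=1e-6):
    """Maximize the NB2 log-likelihood over log(alpha) by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = nb2_loglik(math.exp(c), y, mu), nb2_loglik(math.exp(d), y, mu)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nb2_loglik(math.exp(c), y, mu)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nb2_loglik(math.exp(d), y, mu)
    return math.exp((a + b) / 2.0)


# One hypothetical simulated data set with a known dispersion parameter.
rng = np.random.default_rng(2014)
true_alpha = 1.5
x, y = simulate_nb(2000, beta0=0.5, beta1=1.0, alpha=true_alpha, rng=rng)
beta_hat, mu_hat = poisson_irls(x, y)
alpha_hat = estimate_alpha(y, mu_hat)
print("beta_hat:", beta_hat, "alpha_hat:", round(alpha_hat, 3), "true alpha:", true_alpha)
```

Repeating this over many seeds and parameter settings, and swapping the gamma multiplier for a longer-tailed mixing distribution, gives the kind of estimated-versus-true dispersion comparison the abstract describes.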
Impact on Industry: The dispersion term of the SI model can be used to obtain reliable empirical Bayes (EB) estimates. The SI-based EB estimates can provide accurate hotspot identification results by ranking crash-prone sites for safety improvement programs.
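The abstract does not give the SI-based EB formula, so the sketch below shows only the conventional NB-based EB estimate that it is meant to improve on, to make clear where a dispersion estimate enters hotspot ranking. It assumes the widely used weight w = 1/(1 + α·μ̂), where μ̂ is a site's model-predicted crash count and α is the NB dispersion parameter, so that EB = w·μ̂ + (1 − w)·y. The function names and the site data are hypothetical.

```python
import numpy as np


def nb_eb_estimates(observed, predicted, alpha):
    """Conventional NB-based EB estimate: w * predicted + (1 - w) * observed."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # The weight on the model prediction shrinks as alpha (over-dispersion) grows.
    w = 1.0 / (1.0 + alpha * predicted)
    return w * predicted + (1.0 - w) * observed


def rank_hotspots(eb, top_k):
    """Indices of the top_k sites by EB estimate, highest first."""
    return np.argsort(-eb)[:top_k]


# Hypothetical sites: observed crash counts and NB-model predictions.
observed = [0, 3, 12, 5]
predicted = [1.2, 2.5, 4.0, 5.5]
eb = nb_eb_estimates(observed, predicted, alpha=0.8)
top = rank_hotspots(eb, top_k=2)
print(np.round(eb, 3), top)
```

Because w falls as α grows, an overestimated α pulls EB estimates toward the observed counts and an underestimated α pulls them toward the model prediction, which is how a biased dispersion estimate can change which sites are flagged.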


Similar Resources

A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data.

Crash data can often be characterized by over-dispersion, heavy (long) tail and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel and innovative multi-parameter models to analyze such data. These multi-parameter models have been proposed for overcoming the limitations of the traditional negative binomial (NB)...


Estimating the Dispersion Parameter of the Negative Binomial Distribution for Analyzing Crash Data Using a Bootstrapped Maximum Likelihood Method

The objective of this study is to improve the estimation of the dispersion parameter of the negative binomial distribution for modeling motor vehicle collisions. The negative binomial distribution is widely used to model count data such as traffic crash data, which often exhibit low sample mean values and small sample sizes. Under such situations, the most commonly used methods for estimating t...


Adjustment for the Maximum Likelihood Estimate of the Negative Binomial Dispersion Parameter

The negative binomial (or Poisson-gamma) model has been used extensively by highway safety analysts because it can accommodate the over-dispersion often exhibited in crash data. However, it has been reported in the literature that the maximum likelihood estimate of the dispersion parameter of NB models can be significantly affected when the data are characterized by small sample size and low sampl...


Growth Estimators and Confidence Intervals for the Mean of Negative Binomial Random Variables with Unknown Dispersion

The negative binomial distribution becomes highly skewed under extreme dispersion. Even at moderately large sample sizes, the sample mean exhibits a heavy right tail. The standard normal approximation often does not provide adequate inferences about the data’s expected value in this setting. In previous work, we have examined alternative methods of generating confidence intervals for the expect...



Publication date: 2014